Abstract. In this paper we describe an approach to finding the shortest 
reset word of a finite synchronizing automaton by using a SAT solver. 
We use this approach to perform an experimental study of the length of 
the shortest reset word of a finite synchronizing automaton. The largest 
automata we considered had 100 states. The results of the experiments 
allow us to formulate a hypothesis that the length of the shortest reset 
word of a random finite automaton with n states and 2 input letters with 
high probability is sublinear with respect to n and can be estimated as 
$1.95\, n^{0.55}$.


1 Introduction 

A deterministic finite automaton (DFA) is a triple $A = (Q, \Sigma, \delta)$, where $Q$ is a 
set of states, $\Sigma$ is an input alphabet, and $\delta : Q \times \Sigma \to Q$ is a transition function 
defining an action of the letters in $\Sigma$ on $Q$. We use a common concise notation, 
denoting $\delta(\ldots \delta(\delta(q, a_0), a_1), \ldots, a_k)$ by $q a_0 \ldots a_k$. 

A word $w \in \Sigma^*$ is said to be a reset word for a DFA $A$ if its action leaves 
$A$ in one particular state no matter what state it starts at: $q_1 w = q_2 w$ for all 
$q_1, q_2 \in Q$. A DFA $A$ is called synchronizing if it possesses a reset word. In this 
paper we describe results of an experimental study of the length of the shortest 
reset word of random automata. 
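The definition can be illustrated with a minimal Python sketch. The dictionary representation `delta[(q, a)]`, the function names, and the choice of the 4-state automaton from the Černý series as the example are assumptions made for this illustration only.

```python
def apply_word(delta, q, word):
    """Return the state reached from q after reading word letter by letter."""
    for a in word:
        q = delta[(q, a)]
    return q


def is_reset_word(states, delta, word):
    """A word is a reset word if it sends every state to one and the same state."""
    images = {apply_word(delta, q, word) for q in states}
    return len(images) == 1


# Example automaton (assumed for illustration): 4 states, letter 'a' is a
# cyclic shift, letter 'b' sends state 3 to state 0 and fixes the other states.
states = range(4)
delta = {}
for q in states:
    delta[(q, 'a')] = (q + 1) % 4
    delta[(q, 'b')] = 0 if q == 3 else q

print(is_reset_word(states, delta, "baaabaaab"))  # a word of length (4 - 1)^2
print(is_reset_word(states, delta, "baaab"))
```

The first word collapses all four states into state 0, while its prefix `baaab` still leaves two possible states.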

It can be easily shown that if an automaton with $n$ states is synchronizing 
then it has a reset word of length less than $n^3$. However, the tightness of this 
bound is far from obvious. In 1964, Černý formulated a conjecture concerning 
the upper bound of the length of the shortest reset word of a synchronizing 
DFA [5]: the length cannot be larger than $(n - 1)^2$. By now the Černý conjecture 
is arguably the longest standing open problem in the combinatorial theory of 
finite automata. The tightest upper bound that has been obtained so far is 
$(n^3 - n)/6$; it was proved by Pin [14] in 1983. 

Though no bound better than cubic has been proven for the shortest reset 
word, most naturally occurring automata have reset words of subquadratic 
length. Automata with a reset word of length $\Omega(n^2)$ are considered to be 
exceptional. For a long time the only infinite series of such automata was the one 
proposed by Černý [5]. Other, substantially different series [1, 2] have only 
recently been constructed. 



There are several theoretical and experimental results that support the 
statement that most synchronizing automata have a relatively short reset word. First, 
Higgins [10] has shown that the composition of $2n$ random mappings of a set 
of size $n$ into itself with high probability (whp) is a mapping with an image of 
size 1. (By "high probability" we mean that the probability tends to 1 as $n$ goes 
to infinity.) In terms of automata, Higgins's result means that a random 
automaton with an alphabet of size larger than $2n$ whp has a reset word of length $2n$. 
Indeed, if we pick an automaton uniformly at random among all automata with 
$n$ states and $2n$ letters, then the action of a word composed of all the letters is 
identical to a mapping composed of $2n$ random mappings. Later it was shown 
[16] that a random automaton with $n$ states over an alphabet of size $n^{0.5+\varepsilon}$ has 
a reset word of quadratic length with high probability for any $\varepsilon > 0$. 

The probability distribution of the length of the shortest reset word of a 
random automaton can be studied experimentally for small $n$. It is unlikely 
that there is a polynomial algorithm that finds the shortest reset word in the 
general case, because the problem is complete for $\mathrm{FP}^{\mathrm{NP}[\log]}$ [13], which means that 
it is both NP-hard and co-NP-hard. Moreover, approximating the 
length of the shortest reset word has also been shown to be hard [5]. Nevertheless, 
it is possible that the problem restricted to a certain class of automata (for 
instance see [?]) or to random automata is easy and can be successfully solved by 
an appropriate heuristic. Recently, Roman [15] has developed a genetic algorithm 
for finding a short reset word and, in particular, applied it to random automata. 
In this paper we present the results of applying SAT solvers to the problem 
of finding the shortest reset word. 

SAT (or Boolean Satisfiability) is a combinatorial problem of finding a boolean 
assignment that satisfies a given boolean formula in conjunctive normal form. 
SAT was one of the first problems proven to be NP-complete [6]. The 
development of practical algorithms for solving instances of SAT (so-called SAT 
solvers) is an area of active research and there is a regular competition of these 
algorithms. These days the problems that participate in SAT competitions have 
hundreds of thousands of variables and millions of literals. This is especially 
surprising when one recalls that SAT is NP-complete. This observation does not 
formally contradict the NP-hardness of SAT, but shows that hard instances of 
SAT rarely occur in practice. There are various approaches to explaining this 
phenomenon in greater detail [11, 7, 4]. 

SAT is also known to be a natural language for a variety of combinatorial 
problems. In this paper we show that the problem of finding the shortest reset 
word of a finite automaton can be naturally reduced to a few SAT instances. 
We apply a SAT solver to those instances and recover the reset word from the 
resulting boolean assignment. 

As mentioned, Roman [15] used a genetic algorithm to find a reset word 
of random automata. Since genetic algorithms are incomplete, the results of [15] 
allow one to assume only an upper bound on the length of the shortest reset 
word. It turns out that even for an alphabet of size 2, as the number of states 
grows, the probability of the automaton being synchronizing approaches 1. In this 
paper we also study automata over a 2-letter alphabet. It is easy to see that if 
the size of the alphabet gets larger, the length of the shortest reset word of a 
random automaton decreases. 

We were able to find the shortest reset words of randomly generated 
automata with up to 100 states. We argue that the results of our experiments are 
a reasonable basis for a hypothesis about the length of the shortest reset word of 
a random automaton. The hypothesis is given by the following formula: 

$$\ell(n) \approx 1.95\, n^{0.55},$$

where $n$ is the number of states of the random automaton and $\ell(n)$ is the length 
of the shortest reset word. 

The rest of the paper is organized as follows. In Section 2, we describe how 
the problem of finding the shortest reset word can be reduced to a collection 
of instances of SAT. In Section 3, we formally define the notion of a random 
automaton. In Section 4, we present results of experiments and what we believe 
they mean. We conclude in Section 5 with a short discussion. 

2 Solving Automata Synchronization Problem via 
Reduction to SAT 

Given a finite automaton $A = (Q, \{a, b\}, \delta)$ and an integer $c$, we build a 3-CNF 
formula $\varphi_c^A$ such that $\varphi_c^A$ is satisfiable if and only if $A$ has a reset word $w$ of 
length $c$. We denote the prefix of $w$ of length $t$ by $w|_{1 \ldots t}$. 
The formula $\varphi_c^A$ contains two types of variables: 

— For each $t \in \{1, \ldots, c\}$, we introduce a variable $u_t$. Setting $u_t$ to true is 
interpreted as "the $t$-th letter of $w$ is $a$" and setting $u_t$ to false is interpreted as 
"the $t$-th letter of $w$ is $b$". 

— For each $q \in Q$ and $t \in \{0, \ldots, c\}$, we introduce a variable $x_{q,t}$. A variable 
$x_{q,0}$ is used to mark whether the automaton can initially be in the state $q$ or 
not. When $t > 0$, setting $x_{q,t}$ to false is interpreted as "there does not exist 
a state $u$ such that $u\,w|_{1 \ldots t} = q$". It is convenient for us to interpret setting 
$x_{q,t}$ to true as "there may exist a state $u$ such that $u\,w|_{1 \ldots t} = q$". In other 
words, the clauses force $x_{q,t}$ to be true whenever such a state exists, but never force it to be false. 

There are $c$ variables of the first type and $(c+1)n$ variables of the second type. 
Therefore the resulting boolean formula contains $(c+1)n + c$ boolean variables. 

There are also three types of clauses in $\varphi_c^A$: 

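— For each $q \in Q$ we assert that initially the automaton can be in this state 
by adding a one-literal clause 

$$x_{q,0}.$$

— For each $q \in Q$ and $t \in \{0, \ldots, c - 1\}$ we add the following elementary 
disjunctions to $\varphi_c^A$: 

$$\neg x_{q,t} \lor \neg u_{t+1} \lor x_{qa,\,t+1}, \qquad \neg x_{q,t} \lor u_{t+1} \lor x_{qb,\,t+1}.$$

Note that these disjunctions are equivalent to the following implications: 

$$x_{q,t} \land u_{t+1} \rightarrow x_{qa,\,t+1}, \qquad (1)$$
$$x_{q,t} \land \neg u_{t+1} \rightarrow x_{qb,\,t+1}. \qquad (2)$$

The clauses of the first and the second types together enforce setting $x_{q,t}$ 
to true whenever the state $q$ can be reached from some state of $A$ by 
applying the prefix $w|_{1 \ldots t}$. 

— For each 2-element subset $\{p, q\} \subseteq Q$, where $p \ne q$, we add the following 
elementary disjunction to $\varphi_c^A$: 

$$\neg x_{p,c} \lor \neg x_{q,c}. \qquad (3)$$

The clauses of the third type ensure that at most one of the variables $x_{q,c}$ 
may be true. 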

If $w$ is a reset word of length $c$ for $A$, then the formula $\varphi_c^A$ is satisfiable. 
Indeed, a satisfying assignment is obtained as follows. The values of the variables 
$u_1, \ldots, u_c$ are determined by reading the word $w$ and setting $u_t$ to true or false 
according to the $t$-th letter of $w$. Then we assign $x_{q,0} = \text{true}$ for 
all $q \in Q$. Next, for each $t = 1, \ldots, c$ and for each $q \in Q$, we assign $x_{q,t}$ to 
true if this is required to satisfy some clause of type (1) or (2). Otherwise, we 
assign $x_{q,t}$ to false. It is easy to see that after such an assignment, for any $t$ and 
$q$, the variable $x_{q,t}$ is true if and only if $q = u\,w|_{1 \ldots t}$ for some $u \in Q$. Since 
$w$ is a synchronizing word, all clauses of type (3) are satisfied. Conversely, if 
the formula $\varphi_c^A$ is satisfiable, then the values of the variables $u_1, \ldots, u_c$ in the 
satisfying assignment define a word $w$ of length $c$, and the fact that all clauses 
of $\varphi_c^A$ are satisfied implies that $w$ is a reset word. 

There are $n$ clauses of the first type, $2cn$ clauses of the second type and 
$\frac{n(n-1)}{2}$ clauses of the third type. In total we have $\frac{n(n-1)}{2} + n(2c + 1)$ clauses. 

Clauses of the first type have one literal, clauses of the second type have three 
literals and clauses of the third type have two literals each. Therefore the formula 
$\varphi_c^A$ contains $n^2 + 6cn$ literals in total. 
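The construction and the counts above can be checked with a small Python sketch. The integer numbering of the variables (DIMACS style, a negative number denoting a negated literal), the helper names, and the example automaton are assumptions of this illustration and are not taken from the authors' implementation. The sketch builds the three clause types and also builds the satisfying assignment from a given word in the way described in the proof.

```python
from itertools import combinations


def u_var(t):
    """Index of u_t (t = 1..c): true means 'the t-th letter is a'."""
    return t


def x_var(q, t, n, c):
    """Index of x_{q,t} (q = 0..n-1, t = 0..c)."""
    return c + t * n + q + 1


def build_cnf(n, delta, c):
    """Return the clauses for states 0..n-1, letters 'a' and 'b', word length c."""
    clauses = [[x_var(q, 0, n, c)] for q in range(n)]              # first type
    for t in range(c):                                             # second type
        for q in range(n):
            qa, qb = delta[(q, 'a')], delta[(q, 'b')]
            clauses.append([-x_var(q, t, n, c), -u_var(t + 1), x_var(qa, t + 1, n, c)])
            clauses.append([-x_var(q, t, n, c), u_var(t + 1), x_var(qb, t + 1, n, c)])
    for p, q in combinations(range(n), 2):                         # third type
        clauses.append([-x_var(p, c, n, c), -x_var(q, c, n, c)])
    return clauses


def assignment_from_word(n, delta, word):
    """Set of true variables: u_t from the letters, x_{q,t} iff q is reachable."""
    c = len(word)
    true_vars = {u_var(t) for t in range(1, c + 1) if word[t - 1] == 'a'}
    reachable = set(range(n))
    for t in range(c + 1):
        true_vars |= {x_var(q, t, n, c) for q in reachable}
        if t < c:
            reachable = {delta[(q, word[t])] for q in reachable}
    return true_vars


def satisfies(clauses, true_vars):
    """Check that every clause contains at least one true literal."""
    return all(any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
               for clause in clauses)


# Example automaton (assumed): 'a' is a cyclic shift, 'b' sends state 3 to 0.
n, c = 4, 9
delta = {}
for q in range(n):
    delta[(q, 'a')] = (q + 1) % n
    delta[(q, 'b')] = 0 if q == n - 1 else q
clauses = build_cnf(n, delta, c)
print(len(clauses), sum(len(cl) for cl in clauses))
```

For this example the clause and literal counts agree with the expressions $\frac{n(n-1)}{2} + n(2c+1)$ and $n^2 + 6cn$, and the assignment built from the reset word `baaabaaab` satisfies all clauses, whereas the one built from `aaaaaaaaa` violates a clause of the third type.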

Thus, we can use a SAT solver to answer the question, 

"Can A be synchronized by a word of length c?" 

We use the MiniSat solver [5] to find the solution to this problem. The development 
of SAT algorithms is a very active research area, and each year new solvers 
win the competition. MiniSat was developed in 2003 and has been a 
state-of-the-art solver since then. The algorithm is relatively simple and yet very 
efficient: its performance is comparable to that of the best present-day solvers. In 
some years, the SAT competition has a specialized MiniSat-hack tournament. 
For more details on the algorithm see [8, 9]. 

Once we have an algorithm that can check whether there is a reset word 
of a given length, we can find the length of the shortest reset word by performing 
binary search (see Fig. 1). Binary search is applicable because a reset word of 
length $c$ can be extended to a reset word of any greater length by appending 
arbitrary letters. Note that there is a polynomial algorithm for checking 
whether $A$ is synchronizing [5]. Thus, we use the algorithm in Fig. 1 only for 
synchronizing automata. 
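One standard polynomial test, which may or may not be the exact algorithm the authors refer to, uses the pairwise criterion: an automaton is synchronizing if and only if every pair of states can be mapped to a single state by some word. The following Python sketch is an illustration under that assumption; the function name and the dictionary representation `delta[(q, a)]` are hypothetical.

```python
from collections import deque
from itertools import combinations


def is_synchronizing(n, delta, letters=('a', 'b')):
    """Pairwise criterion: every pair of states must be mergeable by some word."""
    # Reverse edges in the graph of unordered pairs; (q, q) stands for a singleton.
    preimages = {}
    for p in range(n):
        for q in range(p, n):
            for a in letters:
                image = tuple(sorted((delta[(p, a)], delta[(q, a)])))
                preimages.setdefault(image, []).append((p, q))
    # Backward breadth-first search starting from all singletons.
    mergeable = {(q, q) for q in range(n)}
    queue = deque(mergeable)
    while queue:
        pair = queue.popleft()
        for pre in preimages.get(pair, []):
            if pre not in mergeable:
                mergeable.add(pre)
                queue.append(pre)
    return all(pair in mergeable for pair in combinations(range(n), 2))


# Two example automata with 4 states (assumed for illustration).
cerny = {}
cyclic = {}
for q in range(4):
    cerny[(q, 'a')] = (q + 1) % 4
    cerny[(q, 'b')] = 0 if q == 3 else q
    cyclic[(q, 'a')] = (q + 1) % 4      # both letters act as the same permutation,
    cyclic[(q, 'b')] = (q + 1) % 4      # so no two states can ever be merged

print(is_synchronizing(4, cerny), is_synchronizing(4, cyclic))
```

The search visits each of the $O(n^2)$ pairs at most once, so the running time is polynomial in $n$.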
Input: Synchronizing n-state automaton A; we assume that n > 1 
Output: The shortest reset word of A 

r = n ** 3    # a reset word of length at most r is known to exist
l = 0         # no reset word of length at most l exists
while True: 
    # Note: we use integer division in the next formula. 
    c = (l + r) // 2 
    if c == l: 
        return SynchroWord(r) 
    if SynchroWord(c) is not None: 
        r = c 
    else: 
        l = c 


Fig. 1. Binary search applied to the shortest reset word problem. We assume 
that we have a function SynchroWord(c) that returns a reset word of length less 
than or equal to c if such word exists and returns None otherwise. 
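The binary search of Fig. 1 can be tried out on tiny automata with the following Python sketch. Note that `synchro_word` here is a stand-in and not the SAT-based procedure of this paper: it performs a breadth-first search over subsets of states, which is exponential in the worst case. All names and the example automaton are assumptions of the illustration.

```python
from collections import deque


def synchro_word(n, delta, c, letters=('a', 'b')):
    """Stand-in oracle: a reset word of length <= c, or None if there is none."""
    start = frozenset(range(n))
    words = {start: ''}                 # shortest word found for each subset
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        word = words[subset]
        if len(subset) == 1:
            return word
        if len(word) == c:              # do not explore words longer than c
            continue
        for a in letters:
            image = frozenset(delta[(q, a)] for q in subset)
            if image not in words:
                words[image] = word + a
                queue.append(image)
    return None


def shortest_reset_word(n, delta):
    """Binary search as in Fig. 1; assumes a synchronizing automaton, n > 1."""
    l, r = 0, n ** 3
    while True:
        c = (l + r) // 2
        if c == l:
            return synchro_word(n, delta, r)
        if synchro_word(n, delta, c) is not None:
            r = c
        else:
            l = c


# Example automaton (assumed): 'a' is a cyclic shift, 'b' sends state 3 to 0.
n = 4
delta = {}
for q in range(n):
    delta[(q, 'a')] = (q + 1) % n
    delta[(q, 'b')] = 0 if q == n - 1 else q
word = shortest_reset_word(n, delta)
print(word, len(word))
```

For this 4-state automaton of the Černý series the search returns a word of length $(4 - 1)^2 = 9$.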



3 Random Automaton 

In the experimental section we study the length of the shortest reset word of a 
random automaton over a 2-letter alphabet. Formally, the Random Automaton $A(n)$ 
with $n$ states over an alphabet $\Sigma$ can be defined as a discrete probability space 
$(\Omega_A, P)$, where the sample space $\Omega_A$ is the set of all automata with $n$ states over $\Sigma$. To define a 
specific automaton $A = (Q, \Sigma, \delta)$ one needs to define $\delta(q, a)$ for each $q \in Q$ and 
$a \in \Sigma$. Thus, it is easy to see that $|\Omega_A| = n^{|\Sigma| n}$. We set the probability of all 
elements of the sample space to be equal, and consequently for all $A$ we have 
$P(A) = n^{-|\Sigma| n}$. We also consider the probability space Random Synchronizing 
Automaton $A'(n)$. Formally, $A'(n)$ is defined as the probability space induced by 
$A(n)$ on the event "$A$ is synchronizing". 
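Sampling from $A(n)$ amounts to choosing each of the $|\Sigma| n$ transitions independently and uniformly among the $n$ states. A minimal Python sketch, with hypothetical function names and a fixed seed chosen only to make the example reproducible:

```python
import random


def random_automaton(n, letters=('a', 'b'), rng=random):
    """Sample delta uniformly: every transition is an independent uniform state."""
    return {(q, a): rng.randrange(n) for q in range(n) for a in letters}


def number_of_automata(n, alphabet_size=2):
    """Size of the sample space: n choices for each of alphabet_size * n transitions."""
    return n ** (alphabet_size * n)


rng = random.Random(0)                  # seeded generator, an arbitrary choice
delta = random_automaton(5, rng=rng)
print(len(delta), number_of_automata(5))
```

A sample from $A'(n)$ can then be obtained by rejection: automata are drawn in this way until a synchronizing one appears.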

The length of the shortest reset word $\ell(n)$ is a random variable over the 
probability space $A'(n)$. To study the behaviour of the random variable $\ell(n)$ 
as $n$ tends to infinity we denote the expectation of $\ell(n)$ by $r(n)$ and the variance 
of $\ell(n)$ by $d(n)$, that is 

$$r(n) = \mathrm{E}(\ell(n)),$$
$$d(n) = \mathrm{Var}(\ell(n)).$$

Note that while $\ell(n)$ is a random variable for each $n$, the functions $r(n)$ and 
$d(n)$ are deterministic. 



4 Experimental Results 



We performed a series of experiments for different $n$, where $n$ is the number of 
states in the automaton. For a given $n$, a single experiment consists of the following. 
We generate a random automaton with $n$ states. Then we check whether this 
automaton is synchronizing and, if so, we find the shortest reset word for this automaton 
using the algorithm described in Fig. 1. Then we record the result of 
synchronization, i.e., whether the automaton is synchronizing and the length of the shortest 
reset word. 
Fig. 2. Probability distribution of $\ell(50)$. 



For a specified number of states $n$, we performed a number of such 
experiments. The larger $n$ is, the more time it takes to solve the problem of 
finding the reset word, so for larger $n$ we performed fewer experiments. For 
each $n \in \{1, 2, \ldots, 20, 25, 30, \ldots, 50\}$ we performed 2000 experiments, for each 
$n \in \{55, 60, 65, 70\}$ we performed 500 experiments and for $n \in \{75, 80, \ldots, 100\}$ 
we performed 200 experiments. In our experiments we used a personal computer 
with an Intel(R) Core(TM)2 Duo P8600 2.4GHz CPU and 4GB of RAM. The 
program for calculations was written in Java. The average calculation time was 
2.7 seconds for $n = 50$ and 70 seconds for $n = 100$. 

Thus, for each value of $n$ used in the experiments we have an approximate 
probability distribution of $\ell(n)$ and an estimated probability of the event 
"$A(n)$ is synchronizing". In Fig. 2 we show the distribution of $\ell(50)$. 

4.1 Synchronization of A 

The larger $n$ is, the larger the fraction of generated random automata that 
are synchronizing. For $n = 100$ only 1 out of 200 automata that we generated 
happened to not be synchronizing. Thus, we conclude that it is likely that 
$P(\text{"}A\text{ is synchronizing"}) \to 1$ as $n \to \infty$. 

4.2 Expectation of $\ell(n)$ 

It appears that the function $r(n)$ follows a certain trend. To check whether the 
dependence of the mean value of the distribution $\ell(n)$ on $n$ follows a power law, we 
plot the graph in log/log space in Fig. 3. From the graph we conclude that it is 
a combination of some effects that are present for small $n$ and an affine function 
that is obeyed for large $n$. To extract the behaviour of $r(n)$ for large $n$, we ignore 
data points for $n < 20$. We use the least squares method to find an affine function 
that best reflects the dependency of $\log(r)$ on $\log(n)$. We find that 

$$\log(r(n)) \approx 0.55 \log(n) + 0.67. \qquad (4)$$

Taking the exponent of both sides of (4) we obtain the equation 

$$r(n) \approx 1.95\, n^{0.55}. \qquad (5)$$
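The fitting step can be sketched in Python as follows. The data below are synthetic and noise-free, generated from the fitted law itself only to exercise the procedure; they are not the measurements of this paper. The function name is hypothetical, and the logarithm is assumed to be natural, which is consistent with $e^{0.67} \approx 1.95$.

```python
import math


def fit_power_law(ns, rs):
    """Least-squares fit of log(r) = slope * log(n) + intercept."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(r) for r in rs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Synthetic data for n >= 20 only (NOT the experimental data of the paper).
ns = list(range(20, 101, 5))
rs = [1.95 * n ** 0.55 for n in ns]
slope, intercept = fit_power_law(ns, rs)
coefficient = math.exp(intercept)       # exponentiating the intercept, as in (5)
print(round(slope, 2), round(coefficient, 2))
```

With real measurements the mean reset word length for each $n$ would take the place of the synthetic values in `rs`.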

In Fig. 4 we plot the graph of $r$ versus $n$ and the curve given by (5). It 
is interesting to note that the obtained approximation starts to fit the data at 
$n = 17$, approximating some data points that were not used in the fit. 
Fig. 3. The graph of the logarithm of the number of states of automata $n$ versus 
the logarithm of the length of the shortest reset word $r$. 



Fig. 4. The graph of the mean length of the shortest reset word (vertical axis) 
versus the number of states $n$ of the random automata (horizontal axis) and a 
power function approximating it. 



4.3 Variance of $\ell(n)$ 

Recall that we denote the variance of $\ell(n)$ by $d(n)$. Our experiments show that as $n$ 
grows, $d(n)$ also grows. What is more interesting, however, is the behaviour 
of the function $\sqrt{d(n)}/r(n)$. In Fig. 5 we show the graph of $\sqrt{d(n)}/r(n)$, which appears 
to tend to 0 as $n$ goes to infinity. Below we discuss what that means for the 
distribution of $\ell(n)$. 

The Chebyshev inequality reads (we omit the parameter $n$ for conciseness): 

$$\forall M > 0: \quad P\left(|\ell - r| \ge M\sqrt{d}\right) \le \frac{1}{M^2},$$

and after transformation we have 

$$\forall M > 0: \quad P\left(\left|\frac{\ell}{r} - 1\right| \ge M\,\frac{\sqrt{d}}{r}\right) \le \frac{1}{M^2}.$$

If $\sqrt{d(n)}/r(n)$ tends to 0, there exists $M(n)$ such that $M(n) \to \infty$ and 
$M(n)\sqrt{d(n)}/r(n) \to 0$ (for instance, $M(n) = (\sqrt{d(n)}/r(n))^{-1/2}$). Therefore, we have 

$$P\big(\ell(n) = r(n) + o(r(n))\big) \to 1.$$

In other words, with high probability $\ell(n)$ is approximately equal to $r(n)$. 
Fig. 5. The graph of $\sqrt{d(n)}/r(n)$, which appears to tend to 0 as $n$ goes to infinity. 

5 Conclusion and Discussion 

We interpret the experimental results as indicating that as $n$ goes to infinity, a 
random automaton is synchronizing with high probability. Also, with high 
probability the length of its shortest reset word can be estimated as 

$$\ell(n) \approx 1.95\, n^{0.55}. \qquad (7)$$

In particular, we believe that the experimental data we obtained suggest that 
the length of the shortest synchronizing word of a random automaton is sublinear 
with respect to the number of states. 

It is worth noting that our conclusion (7) directly contradicts a conjecture that 
Roman formulated in [15]. Namely, Roman conjectured that the mean length of 
the shortest reset word for a random $n$-state synchronizing automaton is almost 
equal to $0.486n$. Roman's experiments with random automata consisted of two 
parts: for each $n = 5, 6, \ldots, 14$ one thousand random $n$-state automata were 
generated and then for each $n = 15, 16, \ldots, 100$ ten random $n$-state automata 
were generated. The linear estimate $\ell(n) \approx 0.486\,n + 1.654$ was suggested on 
the basis of the results of the first part of the experiments and then it was 
extrapolated even though the reported results of the second part did not really 
support the extrapolation. In contrast, we believe that both our and Roman's 
experiments with larger $n$ indicate that a random automaton is synchronized by 
a word of length sublinear with respect to the number of states. 

We are also aware of another series of experiments with random automata 
synchronization performed by Gusev (these experiments are mentioned in [1]). 
A direct comparison of our results with those of Gusev is impossible because 
he used a different random automata model. However, on a qualitative level our 
conclusions largely agree with Gusev's. 